A Formal Comparison of Visual Web Wrapper Generators
Abstract
We study the core fragment of the Elog wrapping language used in the Lixto system (a visual wrapper generator) and formally compare Elog to other wrapping languages proposed in the literature.
Similar Resources
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers that extract data records from web pages. Our approach relies mainly on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Chapter 3.24 XWRAPComposer: A Multi-Page Data Extraction Service
We present a service-oriented architecture and a set of techniques for developing wrapper code generators, including the methodology of designing an effective wrapper program construction facility and a concrete implementation, called XWRAPComposer. Our wrapper generation framework has two unique design goals. First, we explicitly separate tasks of building wrappers that are specific to a Web s...
A Multi-Page Data Extraction Service
We present a service-oriented architecture and a set of techniques for developing wrapper code generators, including the methodology of designing an effective wrapper program construction facility and a concrete implementation, called XWRAPComposer. Our wrapper generation framework has two unique design goals. First, we explicitly separate tasks of building wrappers that are specific to a Web s...
Intelligent Wrapping of Information Sources: Getting Ready for the Electronic Market
Literature search and delivery on the World Wide Web is becoming a rapidly expanding market. Up to now, searching has been mostly free of charge, but in the future we expect more and more providers to charge for their services. The main problems are finding the right provider and extracting the information. UniCats is a system for intelligent information search and extraction from multiple pr...
MNTG: An Extensible Web-Based Traffic Generator
Road network traffic datasets have attracted significant attention in the past decade. For instance, in the area of spatio-temporal databases, researchers use road network traffic data to evaluate and validate their research. Collecting real traffic datasets is tedious, as it usually takes a significant amount of time and effort. Alternatively, many researchers opt to generate synthetic traffic dat...